Hll 
unll 


HHBi 
1  IHHi 

llKr 

lH  H 

1111111 

— msssm 


9& 


villi  m 

HHHH1 
■BHHBil 

ggg       illilSSiI 

W®m  Mil 
lipHiflil 

ffigSm      WW®. 


LIBRARY  OF  THE 

UNIVERSITY  OF  ILLINOIS 

AT  URBANA-CHAMPAIGN 

510.84 

IJHor 
"0.698-702. 
cod.  2- 


t 


The  person  charging  this  material  is  re- 
sponsible £or  its  return  to  the  library  from 
which  it  was  withdrawn  on  or  before  the 
Latest  Date  stamped  below. 

Theft,  mutilation,  and  underlining  of  books 
are  reasons  for  disciplinary  action  and  may 
result  in  dismissal  from  the  University. 

UNIVERSITY    OF     ILLINOIS     LIBRARY    AT    URBANA-CHAMPAIGN 


RECTJ 


L161  —  O-1096 


u,  *jti       Report  No.  UIUCDCS-R-75-700 


NSF  -  OCA  -  GJ -36936  -  000008 


A  PARALLEL  QR-ALGORITHM  FOR  TRIDIAGONAL  SYMMETRIC  MATRICES 


by 


Ahmed  H.  Sameh  and  David  J.  Kuck 


February  1975 


DEPARTMENT  OF  COMPUTER  SCIENCE 
UNIVERSITY  OF  ILLINOIS  AT  URBANA-CHAMPAIGN 


URBANA,  ILLINOIS 


Digitized  by  the  Internet  Archive 
in  2013 


http://archive.org/details/parallelqralgori700same 


* 
A  Parallel  QR-Algorithm  for  Tri diagonal  Symmetric  Matrices 


by 


Ahmed  H.  Sameh   and  David  J.  Kuck 


This  work  was  supported  in  part  by  US  NSF  GJ-36936;  and  in  part  by  the 
Advanced  Research  Project  Agency  of  the  Department  of  Defense  under  Contract 
No.  DAHC04-72-C-0001 . 

** 

Department  of  Computer  Science  and  the  Center  for  Advanced  Computation, 

University  of  Illinois,  Urbana,  Illinois  61801. 

*** 

Department  of  Computer  Science 

University  of  Illinois,  Urbana,  Illinois  61801. 


Abstract 

We  show  that  if  the  size  of  the  tridiagonal  matrix  in  any  given 
iteration  is  n,  then  the  parallel  QR-algorithm  requires  0(log?n)  steps 

with  0(n)  processors  per  iteration  and  no  square  roots.  This  results  in 
a  speedup  of  0(n/log2n)  and  an  efficiency  of  0(l/log?n). 
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1 .  Introduction 

Reinsch  [1]  proposed  a  stable  QR  algorithm  for  finding  the  eigen- 
values of  symmetric  tri diagonal  matrices  that  requires  no  square  roots. 
In  this  paper  we  present  a  version  of  his  algorithm  that  is  suitable  for 
parallel  computers.  We  give  upper  bounds  on  the  time  and  number  of  proces- 
sors required,  per  iteration. 

Throughout  this  paper  we  assume  that  any  number  of  processors  may 
be  used  at  any  time,  but  we  will  give  bounds  on  this  number.  Each  processor 
can  perform  any  of  the  four  arithmetic  operations  in  one  time  step,  and 
there  are  no  memory  or  data  alignment  time  penalties.  We  also  give  the 
following  definitions.  If  p  >^  1  processors  are  used,  we  denote  the  computa- 
tion time  by  T  .  The  speedup  of  a  parallel  algorithm  which  uses  p  processors 
over  a  serial  algorithm  is  defined  by  S  =  T,/T  >_  1,  and  the  corresponding 

efficiency  of  the  computation  is  given  by  E  =  S  /p  <_  1. 

We  show  that  if  the  size  of  the  tri diagonal  matrix  in  any  given 
iteration  is  n,  then  the  parallel  QR  algorithm  requires  0(log2n)  steps  per 

iteration  (and  no  square  roots)  using  0(n)  processors.  This  results  in  a 
speedup  of  0(n/log2n)  and  an  efficiency  of  0(l/log?n).  This  algorithm 

depends  on  a  parallel  scheme  developed  by  Chen  and  Kuck  [2],  for  solving 
linear  triangular  systems. 


2.  The  Algorithm 

One  iteration  of  the  QR  algorithm  can  be  expressed  in  the  form 

[3], 

Q(A-kl)  =  R 
A  =  RQt 


(1) 


where  k  is  an  origin  shift,  Q  is  orthogonal,  R  is  upper  triangular,  and  we 
assume  that  the  tridiagonal  matrix  A  is  of  order  n  =  2m,  for  a  positive 
integer  m.  Note  that  the  tridiagonal  matrix  A  is  similar  to  (A-kl)  rather 
than  A.  Let  the  diagonal  elements  of  (A-kl)  be  denoted  by  a.  (i  =  1,  2,  ..., 
n)  and  the  off-diagonal  elements  by  b.  (i  =  2,  3,  ...,  n).  The  orthogonal 

matrix  Q  is  chosen  as  the  product  of  (n-1)  plane  rotations,  i.e., 

n  0    0  1 4- 1 

Q  =  pn-i  P2  PV  where  pi    rotates  rows  j  and  j+1  annihilating  the 

element  in  the  position  (j+1,  j), 


pj+l 


row  j 


-s 


Therefore,  if  P, 


P*  P^  (A-kl)   is  given  by, 


L3 
^3 


Pr-1 


Y+l 


'r+1 


cr-l   br+l 


'r+1 


Y+2 


then  it  follows  by  induction  that, 


cr  =  gr/Pr 


sr  =  Wpr 


r  =  1 ,  2,   .. . ,  n  -  1 


Vl  "  cr-l  cr  br+l  +  sr  ar+l 


(2) 


where , 


9r=  cr-1  ar  -  cr_2  sr-1  br    r  =  1,  2,  ...,  n 


(3) 


in  which  cQ  =  1,  sQ  =  0.  From  (2)  and  (3)  we  obtain  the  recurrence  relation, 

r  =  2,  3,  ...,  n  -  1       (4) 


cp=c1a-c0s,b 
r  Kr    r-1  r    r-2  r-1  r 


Let  w  =  c   n  p.,  then  (4)  reduces  to  the  second  order  linear  recurrence 
r    r  i=l  ] 


relation, 


wQ=  1,  W]  -0l 


wr  =  ar  Vl  -   br  wr_2 


r  =  2,  3,  ...,  n  -  1 


Solving  (5)  is  equivalent  to  solving  the  lower  triangular  system  of 
equations, 


Lw  =  e 


1 


where, 


w  =  (wQ,  w^  ...,  wn_1)  , 


and 


ij 


"aj 


ii 


0 


1  =  J 

i  =  j  +  1 

i  =  j  +  2 
otherwise 


2    2  2       2 

Using  the  identity  c  +  s  =  1,  and  defining  z     =     n  p.  ,  r 

r         r  r   ^  l 

n  -  1 ,  we  obtain  the  first  order  linear  recurrence  relation 

2    2   , 
z0  =  w0  =  ] 


=  1,  2, 


2  -  k2    2   .  2 
2r  "  br+l  2r-l  +  wr 


r  =  1,  2,  ...,  n  -  1 


This  is  equivalent  to  the  lower  triangular  system, 


Lz  =  w 


where, 


~t     2   2       2 

w  =  (wQ,  wr  . ..,  wn-1)  , 

~t  _  /  2   2       2  x 

Z     VZq>  Z]  »  •  •  •  »  Zp_]  /  > 

r 


1         1  -  J 


and     tlj  = 


-b?     i  =  j  +  1 


0      otherwise 


Note  that  the  squares  of  the  subdi agonal  elements  rather  than 
the  elements  themselves  are  the  given  data.  So  squaring  these  elements 
of  the  original  matrix  will  cost  only  one  step  and  (n-1)  processors, 
which  is  negligible  compared  to  the  time  of  finding  all  the  eigenvalues 
of  the  tri diagonal  matrix. 

The  linear  systems  (6)  and  (8)  are  solved  sequentially.  Chen 
and  Kuck  [2]  have  shown  that  any  m-th  order  linear  recurrence  system  of 
n  equations,  or  any  unit  lower  triangular  system  of  bandwidth  m  +  1,  can 
be  solved  in 

1    2 
Tp  =  (2+log2m)log2n  -  ■2-(lo92m+'lo92m) 

2 
steps  with  p  =  0(m  n)  processors.  Specifically,  the  system  (6)  can  be 

solved  in  (31og2n  -  1)  steps  with  no  more  than  4n  processors.  Once  w  is 

obtained,  forming  the  right-hand  side  w  in  (8)  requires  one  step  with  (n-1) 
processors,  and  the  system  itself  can  be  solved  in  21og2n  steps  with  (n-1) 

processors.  Thus,  solving  (6)  and  (8)  requires  T   =  51og9n  steps 

Pi      L 


with  p,  =  4n  processors.  From  w  and  z  we  can  obtain, 


A  -  <H-i  • 


2  _  ..2,  2 


cT  =  w/z  ,       j  =  1,  2,  ...,  n  -  1 
j    j  j 


sj  =  ii/pj  - 


and,     p^  =  n2/!^ 


2 


in  which  n  =  (an  wn-1  -  t/  wn_2)  and  where  we  have  used  (3)  and  (9). 

It  can  be  shown  [4]  that  the  orthogonal  matrix  Q  is  lower 

Hessenberg  in  which  Q..  =  c.  -,   c.  =  w.  -,  w./z..  ■,  zi     (j  =  1 ,  2,  .. .,  n  -  1 
3         33         J-'  J    J-1  J  J"1  3 

Q   =  c  ,  ,  and  Q.  .+1  =  s.  .  Therefore,  the  elements  of  the  tridiagonal 
matrix  A  =  RQ  are  given  by, 


a.  =  E.   t  •  +  a.,  n  s . 
3         3     3         J+l  3 

n2     c2 


b?.i  =  pLi  s?  »  j  =  1,  2,  ...,  n  -  1 


2 

and      a  =  £  a  -  c  -■  b 
n   ^n  n   *n-l  n 


where, 


Tj  ■  (£,  ♦  Pj2)  , 

5J  =  wj-l  wj/zj  •       J-l.  2.  ....  B-l 


and      cn  =  wn-l/zn-l 


We  show  in  Figure  1  that  the  elements  of  A  can  be  evaluated  in  T   =4 

steps  using  p2  =  3n  processors.  Thus,  the  total  time  required  is  given  by 

T=T   +T   =4+  51og0n  steps  with  p  =  4n  processors.  Reinsch  [1]  has 
p    P1    p2        32 

shown  that  on  a  serial  computer  the  time  required  for  one  iteration  is 
T,  =  lln  steps.  Hence,  our  parallel  algorithm  yields  a  speedup 

S     a  lln/Blog^n  with  an  efficiency  E     ■  ll/^Olog^n. 

The  linear  systems  (6)  and  (8)  can  be  extremely  ill -conditioned. 
For  example,  for  Wilkinson's  matrix  WZ,  [5],  a.  =  11  -  i  (i  =  1,  2,  ...,  21) 

and  b.  =  1,  the  condition  number  of  L,  in  (6),  in  the  first  iteration  is  of 

the  order  10"  .  We  simulated  the  above  algorithm  on  an  IBM  360/75,  and  in 
spite  of  such  high  condition  number  we  obtained  all  the  eigenvalues  correct 
to  at  least  9  decimal  places.  Wilkinson  [6]  has  shown  that  the  computed 
solution  of  a  triangular  system  has  low  relative  errors.  Specifically, 
system  (8),  (w.  >  0,  £..  >  0,  £..  <_0  i  /  j),  will  have  low  relative  error 

in  the  solution  no  matter  how  ill-conditioned  L  may  be. 


aj  *  WM    wj       ZJ 


(bj+i  +  zj    zm>  *  Vi    Vi 


<4  z;V 


2   x-1 


b!+1""j+i  'jj+izJ'   (zj  zJ-i> 


j  =  1,  2,   ...,  n  -  1 


j  =  1,  2,    ....  n  -  2 


n-2       wn-l     'n-1 


Figure  1 
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